xen.git
Keir Fraser [Mon, 24 Aug 2009 07:05:46 +0000 (08:05 +0100)]
xend: Allow vtpm instance uuid to be specified on domain creation

Right now Xen creates a new vtpm instance every time you start up a
domU, even if you specify the instance parameter in your config file.
Each vtpm instance is then given a uuid and the vtpm.db file maps
instance numbers to uuid numbers.

This patch is a hack that lets you explicitly set the uuid of your
vtpm instance. Every time you boot your domU, the vtpm now gets that
uuid and thus always gets the same vtpm instance number instead of
having a new one generated.

So for example, in your config file you would do something like this:
vtpm = [ 'backend=0,uuid=dcdb124b-9fed-4040-b149-dd2dfd8d094c' ]
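
A minimal sketch of how such an entry could be split into fields, written in Python because xend domain config files use Python syntax. The helper name parse_vtpm_entry is hypothetical and is not part of xend; only the 'backend' and 'uuid' keys come from the example above.

```python
import uuid


def parse_vtpm_entry(entry):
    """Split a 'key=value,key=value' vtpm entry into a dict (hypothetical helper)."""
    fields = dict(item.split("=", 1) for item in entry.split(","))
    if "uuid" in fields:
        # uuid.UUID raises ValueError if the string is not a valid UUID.
        fields["uuid"] = str(uuid.UUID(fields["uuid"]))
    return fields


vtpm = ['backend=0,uuid=dcdb124b-9fed-4040-b149-dd2dfd8d094c']
parsed = [parse_vtpm_entry(entry) for entry in vtpm]
print(parsed)
```

Validating the uuid up front means a mistyped value fails at parse time rather than silently mapping to a fresh instance.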

Signed-off-by: Matt Fioravante <Matthew.Fioravante@jhuapl.edu>
Keir Fraser [Mon, 24 Aug 2009 07:03:46 +0000 (08:03 +0100)]
vtpm: Fix hashed-memory file writing.

There is a bug in the vtpm_manager that has to do with hashing and
saving the NVM memory files (vtpm_dm_%d.data). The file is not
truncated when it is written and this results in the hash becoming
invalid because of the extra bits at the end of the file.

This patch adds O_TRUNC to the flags when opening the file.
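
The effect can be reproduced outside vtpm_manager. The sketch below is not the vtpm_manager code (which is C); it uses Python's os.open with the same POSIX flags, and the file name, payloads and choice of SHA-1 are illustrative assumptions.

```python
import hashlib
import os
import tempfile


def write_blob(path, data, truncate):
    """Write data at offset 0, optionally truncating the file first."""
    flags = os.O_WRONLY | os.O_CREAT
    if truncate:
        flags |= os.O_TRUNC
    fd = os.open(path, flags, 0o600)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)


def file_sha1(path):
    """Hash the whole file, including any stale bytes past the new data."""
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()


new_data = b"B" * 8
expected = hashlib.sha1(new_data).hexdigest()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vtpm_dm_demo.data")  # illustrative name
    write_blob(path, b"A" * 32, truncate=True)     # older, longer state

    write_blob(path, new_data, truncate=False)     # rewrite without O_TRUNC
    stale_size = os.path.getsize(path)
    stale_ok = file_sha1(path) == expected

    write_blob(path, new_data, truncate=True)      # rewrite with O_TRUNC
    clean_size = os.path.getsize(path)
    clean_ok = file_sha1(path) == expected

print(stale_size, stale_ok, clean_size, clean_ok)
```

Without O_TRUNC the file keeps its old 32-byte length, so a hash over the file no longer matches a hash over the 8 bytes just written.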

More details on this issue are in the bug report on bugzilla:
http://bugzilla.xensource.com/bugzilla/show_bug.cgi?id=1488

Signed-off-by: Matt Fioravante <Matthew.Fioravante@jhuapl.edu>
Keir Fraser [Mon, 24 Aug 2009 07:02:08 +0000 (08:02 +0100)]
x86: run timers when populating Dom0's P2M table

When booting Dom0 with huge amounts of memory, and/or memory accesses
being sufficiently slow (due to NUMA effects), and the ACPI PM timer
or a high frequency HPET being used, the time it takes to populate the
M2P table may significantly exceed the overflow time of the platform
timer, screwing up time management to the point where Dom0 boot fails.
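
To give a feel for the time scales involved, here is a rough calculation. The ACPI PM timer runs at 3.579545 MHz with a 24-bit (or, on some platforms, 32-bit) counter; the HPET frequency used below is only an assumed example value, since the real frequency is platform specific.

```python
ACPI_PM_HZ = 3_579_545          # ACPI PM timer frequency
EXAMPLE_HPET_HZ = 14_318_180    # assumed example; actual HPET rates vary


def overflow_seconds(counter_bits, freq_hz):
    """Seconds until a free-running counter of the given width wraps."""
    return (1 << counter_bits) / freq_hz


pm24 = overflow_seconds(24, ACPI_PM_HZ)
hpet32 = overflow_seconds(32, EXAMPLE_HPET_HZ)
print("24-bit ACPI PM timer wraps after %.2f s" % pm24)
print("32-bit counter at the example HPET rate wraps after %.1f s" % hpet32)
```

A 24-bit PM timer wraps in under five seconds, so any boot-time loop that runs longer than that without servicing timers can lose track of time.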

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Keir Fraser [Fri, 21 Aug 2009 16:14:35 +0000 (17:14 +0100)]
x86: Ensure irq is disabled before taking vector_lock.

Fixed debug lock issue for taking vector lock.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Keir Fraser [Fri, 21 Aug 2009 16:13:54 +0000 (17:13 +0100)]
ia64: Fix ia64 build issue introduced by per-cpu vector changes.

ia64 has no per-cpu vector support, so change the related APIs back
by defining macros.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Keir Fraser [Fri, 21 Aug 2009 16:13:17 +0000 (17:13 +0100)]
Update .hgignore for tools/libxc/.zlib.deps

Keir Fraser [Fri, 21 Aug 2009 16:12:13 +0000 (17:12 +0100)]
docs/misc: Update XSM Flask documentation

Update the XSM Flask documentation to reflect the support for
policy.24, the updated policy and policy build infrastructure, and how
to enable the optional MLS policy.

Signed-off-by: Stephen D. Smalley <sds@tycho.nsa.gov>
Signed-off-by: George S. Coker, II <gscoker@alpha.ncsc.mil>
Keir Fraser [Fri, 21 Aug 2009 16:11:40 +0000 (17:11 +0100)]
pygrub: Fix elilo handling after password patch.

Signed-off-by: Michal Novotny <minovotn@redhat.com>
Keir Fraser [Fri, 21 Aug 2009 16:00:01 +0000 (17:00 +0100)]
Revert 20105:979fd420311b

Keir Fraser [Fri, 21 Aug 2009 10:10:49 +0000 (11:10 +0100)]
libxc: Remove minios-specific hack for generating .zlib.deps file

It's not needed if one relative path is replaced.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Thu, 20 Aug 2009 21:26:16 +0000 (22:26 +0100)]
libxenguest: Fix libbz2/liblzma dependency computation.

 1. Create an empty dep file if neither lib is installed
 2. Forcibly disable support for libs if building minios

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Thu, 20 Aug 2009 21:12:25 +0000 (22:12 +0100)]
domain builder: Implement bzip2 and LZMA loaders

Recent upstream kernels can be compressed using either gzip,
bzip2, or LZMA.  However, the PV kernel loader in Xen currently only
understands gzip, and will fail on the other two types.  The attached
patch implements kernel decompression for gzip, bzip2, and LZMA so
that kernels compressed with any of these methods can be launched.
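
The dispatch between the three formats can be sketched as follows. This is not the Xen loader (which is C); it is an illustration using Python's standard gzip, bz2 and lzma modules. The function name is hypothetical, and the LZMA check is a heuristic assumption: legacy .lzma streams have no formal magic number, but commonly start with the bytes 5d 00 00.

```python
import bz2
import gzip
import lzma


def decompress_kernel(blob):
    """Pick a decompressor from the leading bytes (illustrative sketch)."""
    if blob[:2] == b"\x1f\x8b":            # gzip magic
        return gzip.decompress(blob)
    if blob[:3] == b"BZh":                 # bzip2 magic
        return bz2.decompress(blob)
    if blob[:3] == b"\x5d\x00\x00":        # common legacy LZMA header (heuristic)
        return lzma.decompress(blob, format=lzma.FORMAT_ALONE)
    return blob                            # assume an uncompressed image


payload = b"kernel" * 100
for packed in (gzip.compress(payload),
               bz2.compress(payload),
               lzma.compress(payload, format=lzma.FORMAT_ALONE)):
    print(len(packed), decompress_kernel(packed) == payload)
```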

Signed-off-by: Chris Lalancette <clalance@redhat.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Thu, 20 Aug 2009 20:15:24 +0000 (21:15 +0100)]
tools/flask/policy: Updates to policy and policy build infrastructure

The original xen policy infrastructure was based off of an early
version of refpolicy. Because of this there was a lot of cruft that
does not apply to building a policy for xen. This patch does several
things. First, it cleans up the makefile to remove many unnecessary
build targets. Second, it fixes an issue where the policy build process
wasn't handling interface files properly. Third, it pulls in the MLS
support functions from current ref policy and makes use of
them. Finally it updates the xen policy with new rules to address
changes in xen since the policy was last worked on, and provides
several new abstractions for creating domains.

Signed-off-by: David P. Quigley <dpquigl@tycho.nsa.gov>
Keir Fraser [Thu, 20 Aug 2009 17:27:31 +0000 (18:27 +0100)]
x86_64 hvm: Adjust COMPAT_VIRT_START for 32-bit HVM guests.

The PV limit should not apply as there is no M2P table mapped into an
HVM guest's virtual address space.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Thu, 20 Aug 2009 15:19:01 +0000 (16:19 +0100)]
xm-test: Fix testcase '11_block_attach_shared_dom0' for up-to date
linux kernels

New kernels have ext2 disabled by default.  This fix uses ext3 for
testcase 11_block_attach_shared_dom0.

Signed-off-by: Andreas Florath <xen@flonatel.org>
Keir Fraser [Thu, 20 Aug 2009 15:17:16 +0000 (16:17 +0100)]
pygrub: Add password support

This checks for the presence of a password line in the grub.conf of
the guest image. If the line is present, both clear text and md5
versions of the password are supported. Editing the grub entries and
the command line is disabled while a password is set in the domain's
grub.conf file but has not yet been entered. A new option to press 'p'
in interactive pygrub has been added to allow entering the grub
password. It has been tested on x86_64 with PV guests and worked fine.
Also, the countdown now stops after a key is pressed, since the user
is probably editing the boot configuration.

Signed-off-by: Michal Novotny <minovotn@redhat.com>
Keir Fraser [Thu, 20 Aug 2009 15:15:52 +0000 (16:15 +0100)]
x86: shadow_alloc_p2m_page() should call shadow_prealloc() before shadow_alloc()

shadow_alloc_p2m_page() fails to call shadow_prealloc() before calling
shadow_alloc().  In certain conditions, notably when PoD is being
exercised, this may cause shadow_alloc() to fail, crashing Xen.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Keir Fraser [Thu, 20 Aug 2009 12:32:31 +0000 (13:32 +0100)]
x86 vmx: Update EIP when appropriate during task switch

Signed-off-by: Kouya Shimura <kouya@jp.fujitsu.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Thu, 20 Aug 2009 09:30:53 +0000 (10:30 +0100)]
Fix xapi xm-tests.

There were a couple of small bugs in the xapi xm-test:
o outdated XenAPI calls were removed from testcase
  (02_xapi-vbd_basic)
o a minor problem with XendLocalStorageRepository, detected when
  running 02_xapi-vbd_basic, is fixed: the missing list_images()
  function is moved from XenQCoWStorageRepo to the common
  base class XendStorageRepository.
o XenAPI session handling and connecting is fixed.
o 03_xapi-network_pos was rewritten and now uses
  XenAPI.

Signed-off-by: Andreas Florath <xen@flonatel.org>
Keir Fraser [Thu, 20 Aug 2009 09:27:37 +0000 (10:27 +0100)]
xm-test: Add status section to xm-test/README

The report functionality is not removed because there is hope
that somebody will set up the server side infrastructure.

Signed-off-by: Andreas Florath <xen@flonatel.org>
Keir Fraser [Thu, 20 Aug 2009 09:16:58 +0000 (10:16 +0100)]
x86: Remove global percpu_mm_info structure, to make dataflow through
mm code clearer.

The FOREIGNDOM method was just confusing and pointless. The deferred
TLB flushing is of questionable value now that much automatic flushing has to be
synchronous to avoid guest SMP races.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Thu, 20 Aug 2009 07:26:51 +0000 (08:26 +0100)]
x86: teardown_msi_irq is not needed.

teardown_msi_irq logic is covered in destroy_irq,
so remove it to avoid freeing msi resource twice.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Keir Fraser [Thu, 20 Aug 2009 07:26:16 +0000 (08:26 +0100)]
x86: calculate nr_irqs_gsi correctly.

This looks like a typo. The issue was introduced by Cset 20076,
and it may break VT-d device assignment.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Keir Fraser [Thu, 20 Aug 2009 07:25:41 +0000 (08:25 +0100)]
xend: Fix error caused by VT-d ACS patch.

Signed-off-by: Allen Kay <allen.m.kay@intel.com>
Keir Fraser [Thu, 20 Aug 2009 07:23:33 +0000 (08:23 +0100)]
pygrub: Revert 19322:3118041f2259, as it breaks timeout=0 behaviour

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Wed, 19 Aug 2009 16:00:26 +0000 (17:00 +0100)]
x86: Fix arch/x86/xen.lds dependencies.

gcc can get the dependency target name wrong (appends .o).

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Wed, 19 Aug 2009 13:23:30 +0000 (14:23 +0100)]
AMD IOMMU: support  "passthrough" and "no-intremap" parameters.

Signed-off-by: Wei Wang <wei.wang2@amd.com>
Keir Fraser [Wed, 19 Aug 2009 13:22:52 +0000 (14:22 +0100)]
Update Xen Flask module to policy.24.

This is a back-port of the latest SELinux code to Xen, adjusted
for Xen coding style and interfaces.  Unneeded functionality such
as most object context config data, handle_unknown, MLS field
defaulting, etc has been omitted.

Signed-off-by: Stephen D. Smalley <sds@tycho.nsa.gov>
Signed-off-by: George S. Coker, II <gscoker@alpha.ncsc.mil>
Keir Fraser [Wed, 19 Aug 2009 13:22:15 +0000 (14:22 +0100)]
xen-hvmctx: don't compile for ia64.

xen-hvmctx is an x86-specific tool, so it shouldn't be compiled for ia64.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Keir Fraser [Wed, 19 Aug 2009 13:21:56 +0000 (14:21 +0100)]
[IA64] define BYTES_PER_LONG to fix compilation error.

Signed-off-by: Isaku Yamahata <yamahata@valinux.co.jp>
Keir Fraser [Wed, 19 Aug 2009 13:13:52 +0000 (14:13 +0100)]
x86 hvm: Clean up vlapic/vioapic/vmsi delivery.

In particular, avoid intermediate delivery bitmaps which restrict
number of vcpus supported.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Wed, 19 Aug 2009 12:17:41 +0000 (13:17 +0100)]
xen pm trace utility cleanup

xenpm trace utility gtraceview cleanup

- add gtraceview help info on how to get raw data by xentrace
- compile trace_exit_reason in non-debug mode as well. trace_exit_reason
  can be enabled/disabled by xentrace at runtime, so there is no need to
  disable it at build time.

Signed-off-by: Yu Ke <ke.yu@intel.com>
Keir Fraser [Wed, 19 Aug 2009 12:16:50 +0000 (13:16 +0100)]
x86 hvm: Remove vendor-specific feature masking of 0x1:ECX.

Vendors are respecting each other's bits.

Signed-off-by: Andre Przywara <andre.przywara@amd.com>
Keir Fraser [Wed, 19 Aug 2009 12:12:16 +0000 (13:12 +0100)]
xend: passthrough: check if a device is behind PCIe switch that lacks ACS

Imagine a PCIe switch that doesn't support ACS (Access Control
Services) and has 2 downstream ports, A and B. According to the PCIe
spec, the switch should directly route a transaction that comes from A
and targets a device under B, so the Root Complex and IOMMU engine are
bypassed. This doesn't work at all in the case of an hvm guest and can
even be a potential security issue, so we should not allow this kind
of device assignment.

If all the intermediate PCIe switches between a device and the Root
Complex support and enable ACS, we can safely assign the device to a
guest.

Cc: Allen Kay <allen.m.kay@intel.com>
Signed-off-by: Dexuan Cui <dexuan.cui@intel.com>
Keir Fraser [Wed, 19 Aug 2009 12:11:33 +0000 (13:11 +0100)]
hotplug scripts: better same_vm checks

Currently the function same_vm in block-common.sh is the one
responsible for detecting whether two block devices can be used at the
same time by two VMs. This is allowed in a few specific cases: when the
two VMs are actually the same VM, and when the two VMs are the guest
and its stubdomain. We need to expand these exceptions to properly
handle save/restore issues: this patch adds to the exceptions the
case when two VMs are the same VM because of save/restore races, and
the case when two VMs are the guest and the stubdomain of the previous
guest, again during save/restore.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Wed, 19 Aug 2009 12:02:31 +0000 (13:02 +0100)]
x86: miscellaneous emulator adjustments

Defer fail_if()-s as much as possible (in favor of possibly generating
exceptions), and avoid generating exceptions when not strictly
necessary.

Avoid fail_if()-s for simple return code checks (making the code that
used them consistent with other, longer existing code).

Eliminate redundant generate_exception_if()-s checking lock_prefix
(which is already covered by the general check prior to decoding
operands).

Also fix the testing code to add PROT_EXEC for the mapping that is
intended to have instructions executed from it.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Keir Fraser [Wed, 19 Aug 2009 12:02:04 +0000 (13:02 +0100)]
x86-64: adjust emulation of control transfers

While Intel and AMD implementations differ in various respects when
it comes to non-default operand sizes of control transfer instructions
and segment register loads (lfs, lgs, lss), it seems to make sense to
(a) match their behavior if they agree and (b) prefer the more
permissive behavior if they don't agree:

- honor operand size overrides on near branches (AMD does, Intel
  doesn't)
- honor operand size overrides on far branches (both Intel and AMD do)
- honor REX.W on far branches (Intel does, AMD doesn't except on far
  returns)
- honor REX.W on lfs, lgs, and lss (Intel does, AMD doesn't)

Also, do not permit emulation of pushing/popping segment registers
other than fs and gs as well as that of les and lds (the latter are
particularly important due to the re-use of the respective opcodes as
VEX prefixes in AVX).

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Keir Fraser [Wed, 19 Aug 2009 12:01:41 +0000 (13:01 +0100)]
x86: extend runstate area updates

In order to give guests a hint at whether their vCPU-s are currently
scheduled (so they can e.g. adapt their behavior in spin loops),
update the run state area (if registered) also when de-scheduling a vCPU.

Also fix an oversight in the compat mode implementation of
VCPUOP_register_runstate_memory_area.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
Keir Fraser [Wed, 19 Aug 2009 11:58:15 +0000 (12:58 +0100)]
x86: Fix max_gsi calculation on systems with discontiguous GSI space.

From: Steven Smith <steven.smith@citrix.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Wed, 19 Aug 2009 11:55:15 +0000 (12:55 +0100)]
xm,xend: Remove tab indents

Signed-off-by: Masaki Kanno <kanno.masaki@jp.fujitsu.com>
Keir Fraser [Wed, 19 Aug 2009 11:54:43 +0000 (12:54 +0100)]
x86: Only allocate vpid for initialised vcpus.

Currently, 32 vpids are statically allocated for each
domain, which prevents supporting more vcpus for an
HVM domain. Remove the limit and only allocate a vpid
for initialised vcpus. This way, vpids can be
non-contiguous for the vcpus of a single domain.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Keir Fraser [Wed, 19 Aug 2009 11:53:46 +0000 (12:53 +0100)]
x86: Implement per-cpu vector for xen hypervisor

Since the Xen and Linux code bases differ considerably, it
is very hard to port Linux's patch and apply it to Xen
directly, so this patch only adopts the core logic of Linux
and makes it work for Xen.

Key changes:
1. vector allocation algorithm
2. all IRQ chips' set_affinity logic
3. IRQ migration on cpu hot remove.
4. Break assumptions which depend on global vector policy.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Keir Fraser [Wed, 19 Aug 2009 11:53:04 +0000 (12:53 +0100)]
x86:  Change Xen hypervisor's interrupt infrastructure
from vector-based to IRQ-based.

In a per-cpu vector environment, the vector space becomes a
multi-dimensional resource, so the vector number is not appropriate
for indexing irq_desc, which stands for a unique interrupt source. As
in Linux, the irq number is chosen to index irq_desc. This patch
changes the vector-based interrupt infrastructure to an irq-based one.
It mostly follows upstream Linux's changes, with some parts
adapted for Xen.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Keir Fraser [Wed, 19 Aug 2009 11:52:38 +0000 (12:52 +0100)]
x86: Change nr_irqs to nr_irqs_gsi.

Currently, nr_irqs is only used for GSI irqs, so change
the name to make its meaning more precise. This is also
the initial step to support irq allocation for
MSI interrupt sources.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Keir Fraser [Sun, 16 Aug 2009 07:46:08 +0000 (08:46 +0100)]
gdbstub: Remove noisy message on every gdbstub entry.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Sun, 16 Aug 2009 07:45:04 +0000 (08:45 +0100)]
stubdoms: parse bridge information

Currently the stubdom-dm script doesn't read the bridge of a vif
from xenstore, so all the vifs assigned to the stubdom always
belong to the default bridge. This patch changes that behavior by
reading the bridge from xenstore and adding it to the stubdom config
file.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Keir Fraser [Sun, 16 Aug 2009 07:43:50 +0000 (08:43 +0100)]
Revert 20066:135b350496fb

Keir Fraser [Fri, 14 Aug 2009 16:26:23 +0000 (17:26 +0100)]
xen-hvmctx: a tool to print the HVM state of a running domain

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Fri, 14 Aug 2009 16:10:11 +0000 (17:10 +0100)]
xend: VBD QoS policy bits

Add the ability to define VBD QoS policy in the xend layer.

Consider the following vbd entry:

vbd = [
   'phy:/dev/server/virtualmachine1-disk,xvda1,w,credit=5000/s@50ms',
]

This means that a VM may perform 5000 I/O operations per second, with
credit being replenished every 50 milliseconds.

The 'credit' xenstore value is used by the blkback driver to rate-limit I/O
operations for the specific device.
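
As an illustration of the option syntax, here is a small sketch that parses a credit string of the form shown above. The function names and the regular expression are hypothetical and are not taken from xend; the per-period figure is only an assumed way of turning the rate and the replenish interval into a budget.

```python
import re

# Matches the form used in the example above: credit=<ops>/s@<period>ms
CREDIT_RE = re.compile(r"^credit=(\d+)/s@(\d+)ms$")


def parse_credit(option):
    """Return (ops_per_second, replenish_period_ms) for a credit option."""
    match = CREDIT_RE.match(option)
    if match is None:
        raise ValueError("unrecognised credit option: %r" % option)
    return int(match.group(1)), int(match.group(2))


def ops_per_period(ops_per_second, period_ms):
    """Assumed budget per replenish interval (integer division)."""
    return ops_per_second * period_ms // 1000


rate, period = parse_credit("credit=5000/s@50ms")
budget = ops_per_period(rate, period)
print(rate, period, budget)
```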

Signed-off-by: William Pitcock <nenolod@dereferenced.org>
Keir Fraser [Fri, 14 Aug 2009 16:09:39 +0000 (17:09 +0100)]
x86 mce: move mce quirks into separate files
Quirk handling is designed to easily add more quirks when needed
w/o messing around in the normal mce code.

Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Keir Fraser [Fri, 14 Aug 2009 16:08:38 +0000 (17:08 +0100)]
xsm/flask:  Fix AVC audit message format

Fix formatting of Flask AVC audit messages so that existing
policy tools can parse them.  After applying,
'xm dmesg | audit2allow' yields the expected result.

Signed-off-by: Stephen D. Smalley <sds@tycho.nsa.gov>
Signed-off-by: George S. Coker, II <gscoker@alpha.ncsc.mil>
Keir Fraser [Fri, 14 Aug 2009 16:08:12 +0000 (17:08 +0100)]
xsm/flask:  Fix sidtab locking bug

We do not need to use the _irqsave/irqrestore forms of spin locking
within the sidtab in Xen's XSM Flask module, and doing so triggers a
BUG_ON() within check_lock() when we subsequently call xmalloc().
This was preventing Xen from booting with XSM/Flask enabled if built
with debug=y. It appears that this broke upon the changes to xmalloc
in changeset 18379:14a9a1629590.

Signed-off-by: Stephen D. Smalley <sds@tycho.nsa.gov>
Signed-off-by: George S. Coker, II <gscoker@alpha.ncsc.mil>
Keir Fraser [Fri, 14 Aug 2009 16:07:23 +0000 (17:07 +0100)]
AMD IOMMU: Destroy passthru guests when IO pagetable allocation fails

Signed-off-by: Wei Wang <wei.wang2@amd.com>
Acked-by: Wei Huang <wei.huang2@amd.com>
Keir Fraser [Fri, 14 Aug 2009 11:26:35 +0000 (12:26 +0100)]
x86: cleanup rdmsr/wrmsr

Use a 64bit value instead of extracting/merging two 32bit values.
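
For readers unfamiliar with the two representations: the rdmsr instruction returns the low half in EAX and the high half in EDX. The sketch below (Python, with hypothetical helper names, not the Xen macros) shows the extract/merge arithmetic that a single 64-bit value makes unnecessary.

```python
MASK32 = 0xFFFFFFFF


def split_msr(value):
    """Split a 64-bit MSR value into (low, high) 32-bit halves."""
    return value & MASK32, (value >> 32) & MASK32


def merge_msr(low, high):
    """Combine 32-bit halves (EAX = low, EDX = high) into one 64-bit value."""
    return ((high & MASK32) << 32) | (low & MASK32)


value = 0x0007040600070406  # arbitrary example value
low, high = split_msr(value)
print(hex(low), hex(high), hex(merge_msr(low, high)))
```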

Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Fri, 14 Aug 2009 09:59:13 +0000 (10:59 +0100)]
x86 mce: make debug messages less noisy

On a guest MCE read, only print the debug message when
a non-zero value has been read. Otherwise Xen is too
noisy.

Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Keir Fraser [Fri, 14 Aug 2009 09:58:32 +0000 (10:58 +0100)]
VMX: issue an NMI rather than just calling the NMI handler
when the VMEXIT code indicates that an NMI has been raised.
Otherwise we might hit a real NMI while in the handler.

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
Keir Fraser [Fri, 14 Aug 2009 09:57:24 +0000 (10:57 +0100)]
hvm: handle access to MSR_AMD64_NB_CFG

Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Keir Fraser [Fri, 14 Aug 2009 07:36:12 +0000 (08:36 +0100)]
x86: Remove EF_* duplicate defs for X86_EFLAGS_*.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Fri, 14 Aug 2009 07:22:34 +0000 (08:22 +0100)]
x86: Do not clear EF.TF in crash-debug mode.
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Thu, 13 Aug 2009 07:40:39 +0000 (08:40 +0100)]
gdbstub: Fix the build and make a few cleanups.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Wed, 12 Aug 2009 13:27:52 +0000 (14:27 +0100)]
gdbstub: Small fixes.

 * Correctly handle EFLAGS.TF in the hypervisor
 * Register value sent with 'P' command is in native byte order.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Wed, 12 Aug 2009 13:16:09 +0000 (14:16 +0100)]
x86 numa: fix nodes' memory parsing when SRAT table includes future-hotplug memory range

A node's future-hotplug memory range normally starts at a very high
address, e.g. 1TB, and is not contiguous with its existing
memory range. It should not be covered by the global variable 'nodes',
which assumes the node's memory is contiguous. Otherwise nodes' memory
ranges can become very big and overlap, and
populate_memnodemap() fails.

We can ignore the future-hotplug memory range for now. Future physical
memory hotplug support will handle it.

Signed-off-by: Yang Xiaowei <xiaowei.yang@intel.com>
Keir Fraser [Wed, 12 Aug 2009 13:13:54 +0000 (14:13 +0100)]
x86 svm: Fix PAT MSR handling when using Nested Paging.

Accesses to the MSR should not be intercepted.

Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Wed, 12 Aug 2009 13:13:00 +0000 (14:13 +0100)]
x86 svm: Fix the build: vlapic_get_reg() takes two arguments.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Wed, 12 Aug 2009 13:06:30 +0000 (14:06 +0100)]
tmem: one-liner correcting stat parsing ordering

Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Keir Fraser [Wed, 12 Aug 2009 13:06:01 +0000 (14:06 +0100)]
x86 svm: Fix checked builds of Windows running on AMD SVM

Checked builds of Windows will, after every modification of the TPR,
read it back again and assert that the value read back matches with
the value written, including the priority sub-class.  Make sure that
we correctly preserve it on vmexit.

As far as I can tell from reading the documentation, the sub-class
doesn't actually do anything, so this should be pretty harmless.
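
A sketch of the idea, assuming the usual local APIC layout in which TPR bits 7:4 hold the priority class and bits 3:0 the sub-class, and assuming the value available on vmexit carries only the 4-bit class. The helper name is hypothetical and this is not the patch itself (which is C).

```python
def merge_tpr(old_tpr, class_from_vmexit):
    """Take the new priority class, keep the previously written sub-class."""
    new_class = (class_from_vmexit & 0x0F) << 4   # TPR bits 7:4
    old_subclass = old_tpr & 0x0F                 # TPR bits 3:0, preserved
    return new_class | old_subclass


# Guest wrote 0x3A earlier; the class later changes to 5.
print(hex(merge_tpr(0x3A, 0x5)))
```

With this merge a guest that reads the TPR back sees the sub-class it wrote, which is what the checked build asserts on.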

Signed-off-by: Steven Smith <steven.smith@eu.citrix.com>
Keir Fraser [Tue, 11 Aug 2009 06:36:26 +0000 (07:36 +0100)]
xentrace: fix "%016x" format

xentrace_format cannot handle the "%016x" format as we expect.
It prints only %(N) in "%016x" format, not "%(N+1)08x%(N)08x".
So I fixed tools/xentrace/formats by using the "%(N+1)08x%(N)08x" format.
Also I added some TRC_PV entries.
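
The formatting technique can be illustrated with Python's mapping-style % operator. How xentrace_format builds its argument mapping is an assumption here; the sketch only shows that two 32-bit words printed as "%(N+1)08x%(N)08x" reproduce the full 64-bit value, with N = 1.

```python
value = 0x0000000112345678

# Assumed layout: word 1 holds the low 32 bits, word 2 the high 32 bits.
args = {
    "1": value & 0xFFFFFFFF,
    "2": (value >> 32) & 0xFFFFFFFF,
}

only_low = "0x%(1)016x" % args          # pads the low word, loses the high word
both_words = "0x%(2)08x%(1)08x" % args  # high word followed by low word

print(only_low)
print(both_words)
```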

Signed-off-by: Akio Takebe <takebe_akio@jp.fujitsu.com>
Keir Fraser [Tue, 11 Aug 2009 06:34:55 +0000 (07:34 +0100)]
libxc: Include private Xen headers in stubdom libxc build

The headers libelf.h and elfstructs.h were removed from
xen/include/public in 19011:7df072566b8c.  But this broke the stubdom
build because parts of libxc depend on them.  This patch adds
$(XEN_ROOT)/xen/include/xen to the stubdom -I path.

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Keir Fraser [Mon, 10 Aug 2009 17:15:19 +0000 (18:15 +0100)]
Update QEMU_TAG to a83d119cfcc20bc7edb427992d6e31b3e99430be

Keir Fraser [Mon, 10 Aug 2009 12:51:28 +0000 (13:51 +0100)]
Revert alloc_idle_vcpu() to support multiple idle domains where max
vcpus is less than max pcpus (e.g., can happen on i386).

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Mon, 10 Aug 2009 12:33:01 +0000 (13:33 +0100)]
x86: make mce debug output more verbose

Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Keir Fraser [Mon, 10 Aug 2009 12:32:02 +0000 (13:32 +0100)]
x86: Remove cpumask.h inclusion from mm.h

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Mon, 10 Aug 2009 12:30:50 +0000 (13:30 +0100)]
pygrub: Remove bogus log.debug line.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Mon, 10 Aug 2009 12:27:54 +0000 (13:27 +0100)]
tmem: expose freeable memory

Expose tmem "freeable" memory for use by management tools.

Management tools looking for a machine with available
memory often look at free_memory to determine if there
is enough physical memory to house a new or migrating
guest.  Since tmem absorbs much or all free memory,
and since "ephemeral" tmem memory can be synchronously
freed, management tools need more data -- not only how
much memory is "free" but also how much memory is
"freeable" by tmem if tmem is told (via an already
existing tmem hypercall) to relinquish freeable memory.
This patch provides that extra piece of data (in MB).

Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Keir Fraser [Fri, 7 Aug 2009 16:31:27 +0000 (17:31 +0100)]
tools: Fix iptables failure test in vif-common.sh

In changeset 19540 a bug was introduced in the fib_iptable function in
vif-common.sh that incorrectly checks the exit status of iptables --
it always believes iptables has failed even when it hasn't.

The attached patch fixes that.  It's also bug 1490.

Signed-off-by: John Haxby <john.haxby@oracle.com>
Keir Fraser [Fri, 7 Aug 2009 16:30:33 +0000 (17:30 +0100)]
x86: replace PAT initialisation magic value with a #define

Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Keir Fraser [Fri, 7 Aug 2009 16:29:50 +0000 (17:29 +0100)]
x86: Increase default max CPUs to 64.

Also remove compile-time limit of 32 for i386. It is no longer
required, since a cpumask was moved out of struct page_info.

Signed-off-by: Wei Gang <gang.wei@intel.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
Keir Fraser [Fri, 7 Aug 2009 16:23:11 +0000 (17:23 +0100)]
x86 p2m: use common p2m ops in common p2m code path

We recently found an assertion failure when EPT mode is
enabled on a 32PAE host with debug=y. The attached patch fixes
that. It uses the common p2m ops in the
common p2m code path p2m_remove_page rather than calling
p2m_gfn_to_mfn(), which is only for shadow mode.

Signed-off-by: Xin, Xiaohui <xiaohui.xin@intel.com>
Keir Fraser [Fri, 7 Aug 2009 16:22:04 +0000 (17:22 +0100)]
xend: Rename device backend value when xm save/migrate

Xend often fails to restore/migrate a PV domain when some of its
device backends are in a driver domain.

Because a checkpoint of the PV domain records the device backend value
as a domain id, the PV domain can be restored/migrated only when the
driver domain has the same id as the backend value in the checkpoint.

This patch fixes it by renaming the device backend value in a
checkpoint from domain id to domain name on xm save/migrate.

The device backend value is not renamed if it is 0, which is
Domain-0, so the checkpoint format stays compatible if you use
only Domain-0 as device backend.
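
A minimal sketch of the renaming rule described above, not the actual
xend code. The function name rename_backend and the domid_to_name
mapping are hypothetical and exist only for this example.

```python
def rename_backend(backend, domid_to_name):
    """Translate a backend domain id into a domain name.

    Hypothetical helper: 'domid_to_name' is an assumed {int: str} mapping.
    """
    try:
        domid = int(backend)
    except ValueError:
        # Already a name (or otherwise not numeric): leave it untouched.
        return backend
    if domid == 0:
        # Domain-0 keeps its numeric value, so old checkpoints stay valid.
        return backend
    return domid_to_name[domid]
```

With this rule a checkpoint that only uses Domain-0 backends is
unchanged, while a backend in a driver domain is stored by name and
no longer depends on that domain's id at restore time.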

Signed-off-by: Rikiya Ayukawa <ayukawa.rikiya@jp.fujitsu.com>
16 years agox86_emulate: Fixes for 'mov rm16,sreg'
Keir Fraser [Fri, 7 Aug 2009 09:53:22 +0000 (10:53 +0100)]
x86_emulate: Fixes for 'mov rm16,sreg'

1. Memory reads should be 16 bits only
2. Attempt to load %cs should result in #UD
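
The two rules can be modelled roughly as follows. This is a toy model
in Python, not the emulator's C code; the function, the exception
class and the flat 'memory' buffer are assumptions of this sketch.
Only the segment register encoding (0=es, 1=cs, 2=ss, 3=ds, 4=fs,
5=gs) is taken from the x86 instruction format.

```python
class UndefinedOpcode(Exception):
    """Stands in for a #UD exception in this toy model."""


SEGMENT_REGS = ["es", "cs", "ss", "ds", "fs", "gs"]


def emulate_mov_rm16_to_sreg(memory, addr, sreg_index):
    """Model 'mov rm16,sreg' with a memory source operand."""
    if sreg_index >= len(SEGMENT_REGS):
        raise UndefinedOpcode("no such segment register")
    sreg = SEGMENT_REGS[sreg_index]
    if sreg == "cs":
        # Rule 2: %cs cannot be loaded with mov.
        raise UndefinedOpcode("mov to %cs")
    # Rule 1: read exactly 16 bits, regardless of operand-size prefixes.
    selector = int.from_bytes(memory[addr:addr + 2], "little")
    return sreg, selector
```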

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agox86_emulate: protmode_load_seg() cannot load system segments in long mode.
Keir Fraser [Fri, 7 Aug 2009 08:54:43 +0000 (09:54 +0100)]
x86_emulate: protmode_load_seg() cannot load system segments in long mode.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agohvmloader: Regression tests need 16MB to run. Check for this.
Keir Fraser [Thu, 6 Aug 2009 10:14:48 +0000 (11:14 +0100)]
hvmloader: Regression tests need 16MB to run. Check for this.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agoept: code clean up and formatting.
Keir Fraser [Thu, 6 Aug 2009 09:02:20 +0000 (10:02 +0100)]
ept: code clean up and formatting.

Fix alignment and comments and add and remove spaces and lines where
appropriate.

Signed-off-by: Patrick Colp <Patrick.Colp@citrix.com>
16 years agox86_emulate: Remove cmpxchg retry loop from protmode_load_seg().
Keir Fraser [Thu, 6 Aug 2009 08:54:22 +0000 (09:54 +0100)]
x86_emulate: Remove cmpxchg retry loop from protmode_load_seg().

It is safer to retry in a loop via the caller.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agotmem: Remove bogus variable decl, fixing build.
Keir Fraser [Thu, 6 Aug 2009 08:53:37 +0000 (09:53 +0100)]
tmem: Remove bogus variable decl, fixing build.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agotmem: save/restore/migrate/livemigrate and shared pool authentication
Keir Fraser [Thu, 6 Aug 2009 08:19:55 +0000 (09:19 +0100)]
tmem: save/restore/migrate/livemigrate and shared pool authentication

This patch implements save/restore/migration/livemigration
for transcendent memory ("tmem").  Without this patch, domains
using tmem may in some cases lose data when doing save/restore
or migrate/livemigrate.  Also included in this patch is
support for a new (privileged) hypercall for authorizing
domains to share pools; this provides the foundation to
accommodate upstream Linux requests for security for shared
pools.

Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
16 years agoept mtrr: replace unsigned long with mfn_t for mfns.
Keir Fraser [Thu, 6 Aug 2009 08:15:42 +0000 (09:15 +0100)]
ept mtrr: replace unsigned long with mfn_t for mfns.

Signed-off-by: Patrick Colp <Patrick.Colp@citrix.com>
16 years agoept p2m: replace unsigned long with mfn_t for mfns.
Keir Fraser [Thu, 6 Aug 2009 08:15:24 +0000 (09:15 +0100)]
ept p2m: replace unsigned long with mfn_t for mfns.

Signed-off-by: Patrick Colp <Patrick.Colp@citrix.com>
16 years agoept p2m: set rwx flags to 0 for invalid and mmio_dm types.
Keir Fraser [Thu, 6 Aug 2009 08:14:52 +0000 (09:14 +0100)]
ept p2m: set rwx flags to 0 for invalid and mmio_dm types.

Read/write/execute flags are set to 1 before calling the type_to_flags
function, which sets them to their appropriate values depending on the
p2m type. However, the invalid, mmio_dm, and default/unknown cases in
type_to_flags just fall through, unsafely leaving full access to
these pages.
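
The intended behaviour can be sketched as a default-deny mapping. This
is a Python illustration rather than the C code in the patch. The
type names other than invalid and mmio_dm, and the permissions given
to them, are assumptions made for the example.

```python
def type_to_flags(p2m_type):
    """Return r/w/x permissions for a p2m type, denying by default."""
    flags = {"r": 0, "w": 0, "x": 0}  # invalid, mmio_dm and unknown types stay at 0
    if p2m_type in ("ram_rw", "mmio_direct"):  # assumed full-access types
        flags.update(r=1, w=1, x=1)
    elif p2m_type in ("ram_ro", "ram_logdirty"):  # assumed read/execute-only types
        flags.update(r=1, x=1)
    return flags
```

Starting from zero and granting access per type means a newly added
or unhandled type cannot end up with full access by accident.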

Signed-off-by: Patrick Colp <Patrick.Colp@citrix.com>
16 years agoRevert 20006:edf21ab7d7a4 and 20023:2b28320c6f8c.
Keir Fraser [Wed, 5 Aug 2009 13:56:29 +0000 (14:56 +0100)]
Revert 20006:edf21ab7d7a4 and 20023:2b28320c6f8c.

16 years agoRevert to pulling QEMU GIT repo via HTTP.
Keir Fraser [Wed, 5 Aug 2009 13:39:46 +0000 (14:39 +0100)]
Revert to pulling QEMU GIT repo via HTTP.

Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agoxend: Remove _setSchedParams
Keir Fraser [Wed, 5 Aug 2009 13:03:38 +0000 (14:03 +0100)]
xend: Remove _setSchedParams

Currently, xc.sched_credit_domain_set is called twice when domains
are created.

start@XendDomainInfo
  _constructDomain
    xc.sched_credit_domain_set  --- 1st
  _initDomain
    _setSchedParams
      domain_sched_credit_set
        xc.sched_credit_domain_set  --- 2nd

resume@XendDomainInfo
  _constructDomain
    xc.sched_credit_domain_set  --- 1st
  _setSchedParams
    domain_sched_credit_set
      xc.sched_credit_domain_set  --- 2nd

This patch removes the _setSchedParams method added by changeset 19955,
because xc.sched_credit_domain_set was added to the _constructDomain
method by changeset 20006.

Signed-off-by: Masaki Kanno <kanno.masaki@jp.fujitsu.com>
16 years agox86 vmx: Accelerate VLAPIC EOI writes
Keir Fraser [Wed, 5 Aug 2009 13:02:46 +0000 (14:02 +0100)]
x86 vmx: Accelerate VLAPIC EOI writes

Our testing indicates that most APIC accesses are EOI writes. This
patch accelerates guest EOI emulation using hardware VM exit
information.

Without this patch, xentrace shows that an APIC access costs ~7.8k
TSC cycles on average in our case; with it, the cost drops to ~3k.
We also save 3% CPU in our case.
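
For reference, the relative saving implied by these two figures works
out as below. Both inputs are the approximate values quoted above, so
the results are approximate as well.

```python
before_tsc = 7800  # ~7.8k TSC cycles per APIC access without the patch
after_tsc = 3000   # ~3k TSC cycles per APIC access with the patch

saved = before_tsc - after_tsc               # cycles saved per access
reduction_pct = 100.0 * saved / before_tsc   # relative reduction in percent
speedup = before_tsc / after_tsc             # how many times cheaper an access is
```

That is a reduction of roughly 60% per access, or about 2.6x.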

From: Yang Zhang <yang.zhang@intel.com>
Signed-off-by: Keir Fraser <keir.fraser@citrix.com>
16 years agox86: CPU synchronization while doing MTRR register update
Keir Fraser [Wed, 5 Aug 2009 12:50:36 +0000 (13:50 +0100)]
x86: CPU synchronization while doing MTRR register update

The current Xen code does not synchronize all the cpus while
initializing MTRR registers when a cpu comes up.

As per IA32 SDM vol 3: Section: 10.11.8 MTRR Considerations in MP
Systems, all the processors should be synchronized while updating
MTRRs.

Processors starting with Westmere cache VMCS data for better VMX
performance. These processors also have Hyper-Threading support. With
Hyper-Threading, disabling the cache on one thread also disables the
cache for its sibling threads, and the MTRR update procedure
involves disabling the cache. So if the cpus are not synchronized,
updating MTRR registers on one thread makes the VMCS data of its
sibling threads inaccessible, which causes system failure.

With this patch, all the cpus are synchronized while the MTRR
registers are updated, as per the IA32 SDM. Also, at boot time and
resume time, when multiple cpus are brought up, an optimization is
added to delay the MTRR initialization until all the cpus are up, to
avoid synchronizing the cpus multiple times.
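
The rendezvous idea can be sketched with threads standing in for cpus.
This is only an analogy in Python, not the hypervisor code: the
function name, the 'events' log and the use of threading.Barrier are
assumptions of this sketch, and the cache-disable step is reduced to
a comment.

```python
import threading


def update_mtrrs_synchronized(num_cpus, new_value):
    """Update a per-cpu 'MTRR' value only while all cpus are rendezvoused."""
    mtrr = [None] * num_cpus
    events = []
    lock = threading.Lock()
    barrier = threading.Barrier(num_cpus)

    def cpu(i):
        barrier.wait()          # wait until every cpu has arrived
        with lock:
            events.append(("enter", i))
        barrier.wait()          # all cpus are now in the update phase
        mtrr[i] = new_value     # real code disables caches around this step
        barrier.wait()          # nobody leaves until all updates are done
        with lock:
            events.append(("exit", i))

    threads = [threading.Thread(target=cpu, args=(i,)) for i in range(num_cpus)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return mtrr, events
```

Because every thread records "enter" before the second barrier and
"exit" only after the third, no cpu resumes normal work while another
is still mid-update.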

Signed-off-by: Nitin A Kamble <nitin.a.kamble@intel.com>
Signed-off-by: Suresh B Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Asit K Mallick <asit.k.mallick@intel.com>
16 years agox86: Enable GNTTABOP_copy hypercall for HVMs
Keir Fraser [Wed, 5 Aug 2009 12:49:35 +0000 (13:49 +0100)]
x86: Enable GNTTABOP_copy hypercall for HVMs

This requires plumbing 32-bit compat guests through the compat version
of the grant-table hypercall.

Signed-off-by: Jayaraman, Bhaskar <Bhaskar.Jayaraman@lsi.com>
16 years agoxm-test restore: use ext3 (instead of ext2) and xvda (instead of hda)
Keir Fraser [Wed, 5 Aug 2009 12:40:21 +0000 (13:40 +0100)]
xm-test restore: use ext3 (instead of ext2) and xvda (instead of hda)

This patch fixes the xm-test restore 04 testcase:
o uses ext3 instead of ext2, which is not supported by the standard
  kernel config
o uses xvdX instead of hdX for disks

Signed-off-by: Andreas Florath <xen@flonatel.org>
16 years agoxm-test: Disable DEBUG_STACK_USAGE which breaks test cases
Keir Fraser [Wed, 5 Aug 2009 12:39:37 +0000 (13:39 +0100)]
xm-test: Disable DEBUG_STACK_USAGE which breaks test cases

The unnecessary 'used greatest stack depth' messages on the console
break xm-test cases at random.  Typically a testcase reads input from
the console and parses it.  When DEBUG_STACK_USAGE is enabled, these
stack usage messages are printed at random times; the test case reads
such a message, cannot handle it and fails.

Signed-off-by: Andreas Florath <xen@flonatel.org>
16 years agoxm-test: fix network13 test (protocol and extensions)
Keir Fraser [Wed, 5 Aug 2009 12:38:38 +0000 (13:38 +0100)]
xm-test: fix network13 test (protocol and extensions)

This patch changes the protocol used from udp (nobody was listening)
to icmp echo, and extends the test so that the dom0 and the other
guest ips are also pinged.
Because of the many different scenarios (three nested loops over
packet sizes, two guests and different ip addresses), one run of this
test case now takes about 4.5 minutes.

Signed-off-by: Andreas Florath <xen@flonatel.org>
16 years agoxm-test: Adapt memory setting to up-to-date kernel memory consumption
Keir Fraser [Wed, 5 Aug 2009 12:37:26 +0000 (13:37 +0100)]
xm-test: Adapt memory setting to up-to-date kernel memory consumption

This patch fixes xm-test memset 04 so that it can be used with
up-to-date kernels.  The old version sets the memory to 15 MByte, which
is too low for modern kernels: the oom-killer in this case kills the
login shell of the test case and init.  The size is increased to 18M,
which gives userspace about 2.5 MByte of memory.

Signed-off-by: Andreas Florath <xen@flonatel.org>
16 years agoxm-test: 10_block_attach_detach_multiple_devices fixed
Keir Fraser [Wed, 5 Aug 2009 12:36:24 +0000 (13:36 +0100)]
xm-test: 10_block_attach_detach_multiple_devices fixed

This patch fixes and (re-)enables test 10 of the block-create suite.
The test attaches and detaches devices to/from a domU at random and
checks that everything is ok.

Signed-off-by: Andreas Florath <xen@flonatel.org>